This paper proposes a novel and robust voice activity detection (VAD) algorithm based on a long-term spectral flatness measure (LSFM), which is capable of working at signal-to-noise ratios (SNRs) of 10 dB and lower. The new LSFM-based VAD improves the robustness of speech detection in various noisy environments by employing a low-variance spectrum estimate and an adaptive threshold. The discriminative power of the new LSFM feature is shown through an analysis of the speech and non-speech LSFM distributions. The proposed algorithm was evaluated on the core TIMIT test corpus under 12 types of noise (11 from NOISEX-92, plus speech-shaped noise) at five SNR levels. Comparisons with three modern standardized algorithms (ETSI adaptive multi-rate (AMR) options AMR1 and AMR2, and ITU-T G.729) demonstrate that the proposed LSFM-based VAD scheme achieves the best average accuracy rate. A long-term signal variability (LTSV)-based VAD scheme is also compared with the proposed method. The results show that the proposed algorithm outperforms the LTSV-based VAD scheme for most of the noise types considered, including difficult ones such as machine gun noise and speech babble noise.
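As a rough illustration of the kind of feature described above, the following minimal sketch computes a long-term flatness value from the power spectra of the most recent frames. It is an assumption-laden sketch, not the paper's exact definition: the function name `long_term_flatness`, the `eps` floor, and the choice to sum the log ratio of geometric to arithmetic mean over frequency bins are all illustrative, and the low-variance spectrum estimate and adaptive threshold mentioned in the abstract are not modeled here.

```python
import numpy as np

def long_term_flatness(power_frames, eps=1e-12):
    """Hypothetical long-term flatness feature (illustrative only).

    power_frames: array of shape (R, K) holding the power spectra of the
    last R frames over K frequency bins.
    """
    # Small floor (an assumption) so that log() never sees zero.
    p = np.asarray(power_frames, dtype=float) + eps
    # Geometric and arithmetic means across time, one value per bin.
    geo_mean = np.exp(np.mean(np.log(p), axis=0))
    arith_mean = np.mean(p, axis=0)
    # By the AM-GM inequality each ratio is <= 1, so the sum is <= 0.
    return float(np.sum(np.log10(geo_mean / arith_mean)))

# A spectrum that does not change over time gives a value near 0,
# while one that fluctuates strongly over time gives a clearly negative value.
steady = np.ones((10, 8))
fluctuating = np.tile(np.array([[1.0], [100.0]]), (5, 8))
```

Under these assumptions, stationary noise tends to yield values close to zero and speech, whose spectrum varies over time, yields more negative values, so a detector could compare the feature against a threshold.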